Our overall aim is to study what KIND of learning occurs in NF, so that we can examine in more detail the relationship between patients’ clinical activity and treatment outcome. This suggests two foci: (A) developing modelling methods to perform a more thorough examination of learning than any prior work; (B) applying the developed models of learning to our clinical trial data to tease out the mechanisms of different protocols and modes of training.
Prior work seems to have focused on overall performance improvement, mainly so that subjects can be split into groups of learners and non-learners, in order to estimate the efficacy of treatment for learners only. This is a neat trick when studying NF efficacy, but not very useful for studying learning itself.
In order to develop methods to examine learning, we must study performance at trial, session, and treatment level, and measure performance both in terms of magnitude and the pattern of change. To achieve this, we derive data that robustly capture (1) magnitude and (2) change for all normal and transfer trials (details below). Normal and transfer trials are pooled in order to minimise differences in the number of trials per session across treatment (which changes because inverse trials are introduced halfway through).
Thus, in part A, our data is within-subjects ‘learning curves’ (LCs), defined as: performance (1) magnitude and (2) change computed from trials within each session, for all sessions.
In part B the data can be subdivided into separate protocols and training modes, to explore the details of how they differ in terms of the best-fit learning model(s) (parametric and non-parametric). The split between A and B helps distinguish task-learning from task-outcomes, even when such outcomes are measured repeatedly throughout the training, such as baseline bandpowers or repeated symptom self-reports.
Following the literature (Fitts & Posner), we investigate whether NF shows evidence of being skill acquisition, i.e. going beyond operant conditioning. This would also be important clinically, because conditioning could conceivably be automated and packaged as, e.g., an ‘app’, whereas skill acquisition requires coaching.
To investigate learning, we must deal with the complexity of analysing it: there are different qualities of learning captured by different sorts of analysis, for example:
The results in draft paper “Learning Curves_v02.docx” show quantification for magnitude and plateau point, where session-wise scores were fitted with linear (growth curve) and quadratic models. These two LC concepts are the main focus of prior work on NFB learning. When studies estimate whether subjects have learned, they usually calculate gain in some way (REF). Several studies have also estimated sufficiency by looking at the number of sessions required to see a plateau in improvement (REF).
Then there is the issue of how to quantify aspects or representations of learning. Parametric methods include linear regression, curve fitting, and hierarchical models. These can be useful but are also sensitive to violations of assumptions, which can be hard to avoid in noisy data. Non-parametric approaches can help, and we develop one based on cosine similarity between performance metrics and canonical learning curves.
First, it’s important to establish overall performance improvement, for context. If the scores of all trials in all types of training tend to increase, that tells us that learning of some kind must have happened, and we can proceed to study what kind.
It’s also interesting to check if learning has plateaued or not, by fitting a quadratic curve and checking the sign of the quadratic term.
As noted above, a linear model of learning neither fits learning theory nor provides much more information than that performance increased or decreased. Other curve families with better empirical support have been used to describe learning. Power law curves were long thought to best describe learning due to practice (Newell & Rosenbloom, 1980). However, other curve families have been argued to fit non-averaged (individual) curves better: e.g. exponential (Heathcote et al, 1999). Further, if performance conforms to a multi-stage profile, e.g. the three-stage model of motor skill learning (Fitts & Posner), then a piecewise power law model can fit better. The type of task-reward that the data come from can also have an effect: e.g. in success-only tasks a sigmoid curve can fit the data (Leibowitz et al 2010), and notably a sigmoid arguably consists of three phases, relating it to the Fitts-Posner model.
These parametric approaches have some issues: violations of their assumptions, such as outliers, can be hard to avoid in noisy data. Some fitted models are very sensitive to small changes in the data: outliers can change the slope of a linear fit or the shape of a fitted curve by significant amounts. Also (perhaps most importantly), treatment-level curve-fitting is blind to intra-session patterns. Thus it is valuable to also look at LCs that are model-free, i.e. non-parametric, and can account flexibly for intra-session variability. We develop non-parametric LCs to provide clear, easy-to-interpret models of learning that are easy to adjust to diverse theories (via cosine similarity).
When the best-fitting learning model(s) are established, we can use them to explore the real complexity of the CENT NF data, including TB and SMR protocols, and the normal, inverse, and transfer training modes. Each of the protocols has theoretical grounding in the regulation of cortical arousal and motoric activation. Thus, we can also relate these sub-groups to the baseline bandpowers per session, the pre-test baseline vigilance analysis, and the per-session sleep self-reports.
* Learning in protocols
* Learning in training modes
* Learning related to session-wise baseline bandpowers
* Learning related to vigilance baseline and sleep self-reports
Our analysis approach has two parts: (A) investigate models of learning in NF data to find best-fit LCs; and (B) use the derived LCs to discover group-wise patterns in the various background and outcome variables available, in a within-subjects manner.
In the rest of the report, we will step through the methods for creating all LCs, to see how they work. We will then explore the relationships with background and outcome variables. First, we describe the data used.
We work primarily from the following datasets (available in shared Dropbox folder or on request):
The raw data is saved as ‘tr.raw’. To create a clean dataset ‘tr.blk’, we filter trials to remove the first session (as it was a training session), session 41 (completed by only 1 patient), trials with score = 0, and trials marked bad by trainers.
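The filtering logic can be sketched as follows (an illustrative Python sketch; the field names `session`, `score`, `marked_bad` are hypothetical stand-ins for the actual columns of `tr.raw`, and the analysis itself is in R):

```python
def clean_trials(trials, drop_sessions=(1, 41)):
    """Drop the training session (1), session 41 (one patient only),
    zero scores, and trainer-flagged trials."""
    return [t for t in trials
            if t["session"] not in drop_sessions
            and t["score"] != 0
            and not t["marked_bad"]]

raw = [
    {"session": 1,  "score": 12.0, "marked_bad": False},  # training session
    {"session": 5,  "score": 0.0,  "marked_bad": False},  # zero score
    {"session": 5,  "score": 15.2, "marked_bad": True},   # flagged bad
    {"session": 41, "score": 20.0, "marked_bad": False},  # only 1 patient
    {"session": 12, "score": 18.4, "marked_bad": False},  # kept
]
blk = clean_trials(raw)  # only the session-12 trial survives
```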
## [1] "Data: tr.raw$adj_score"
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -2.25 10.17 15.59 16.68 22.25 87.24
## [1] "... subset: TB"
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -1.20 12.29 19.93 20.34 27.26 87.24
## [1] "... subset: SMR"
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -2.25 9.53 14.22 14.88 19.52 68.90
## [1] "Data: tr.blk$adj_score"
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.00 12.63 17.06 19.01 23.48 72.96
## [1] "... subset: TB"
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.16 16.95 22.00 23.70 29.09 72.96
## [1] "... subset: SMR"
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.00 11.71 15.37 16.77 20.10 68.90
We also subset scores according to the three training modes (normal, inverse, transfer), and the complement set (normal+transfer = not inverse). So there are five possible (clean) datasets:
To proceed with Part A, we consider only not-inverse trials.
## [1] "Data: tr.not0$adj_score"
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.00 12.26 16.16 18.04 21.45 72.96
## [1] "... subset: TB"
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.73 17.99 22.91 24.97 30.16 72.96
## [1] "... subset: SMR"
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.00 11.14 14.30 14.74 17.60 68.90
We come across our first significant problem. For calculating trial-wise correlations, we require 2 or more trials per session. Many sessions contain only one trial of a certain type, especially for transfer trials. We therefore create datasets that prune out the sessions with trials = 1. We also face a constraint when we calculate the cosine similarity of trial-wise correlations with a hypothetical multi-phase learning curve: the number of sessions should be enough to accommodate the definition of the hypothetical curve (e.g. 3+ for Fitts’ model).
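The pruning step amounts to dropping every (patient, session) cell that contributes only one trial; a minimal Python sketch (field names hypothetical):

```python
from collections import Counter

def prune_single_trial_sessions(trials):
    """Keep only (patient, session) cells with 2+ trials, since a
    trial-wise correlation needs at least two data points."""
    counts = Counter((t["patient"], t["session"]) for t in trials)
    return [t for t in trials
            if counts[(t["patient"], t["session"])] >= 2]

trials = [{"patient": "p1", "session": 3, "score": 14.2},
          {"patient": "p1", "session": 3, "score": 16.8},
          {"patient": "p1", "session": 4, "score": 15.0}]  # lone trial, dropped
pruned = prune_single_trial_sessions(trials)
```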
## [1] "Data: tr.not$adj_score"
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.00 12.26 16.16 18.04 21.45 72.96
## [1] "... subset: TB"
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.73 17.99 22.87 24.96 30.16 72.96
## [1] "... subset: SMR"
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.00 11.14 14.31 14.74 17.62 68.90
We try to describe two different aspects of learning on a per-session basis:

(1) Magnitude - derived as score mean

* We use the outlier-resistant geometric mean of trial-wise scores per session. For comparison here, I also show the median-derived session-wise scores. Both might be fine for calculating session LCs because both are robust to outliers. However, for NFB we can’t assume any model for performance, because we don’t theoretically know how it happens, so we don’t want to totally reject outliers: they might represent something important. Thus, for this report we use the geometric mean. We center and scale the mean to lie from -1 to 1.

(2) Consistency - derived as score monotonicity

* We use rank-order correlation of per-trial adjusted score with trial order. We use Kendall rather than Spearman for rank correlation as it is recommended for low N - the interpretation remains the same. The outcome range is -1 to 1, where monotonic increase in score per session gives Kendall τ=1 and monotonic decrease gives Kendall τ=-1. Presumably, a monotonic increase in performance scores is a positive sign for learning.
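The two session indices can be sketched as follows (an illustrative Python sketch with synthetic scores; the report’s analysis itself is in R):

```python
import numpy as np
from scipy.stats import gmean, kendalltau

def session_magnitude(scores):
    """Magnitude index: outlier-resistant geometric mean of the
    (positive) trial-wise scores within one session."""
    return float(gmean(scores))

def session_consistency(scores):
    """Consistency index: Kendall rank correlation of score with
    trial order; +1 = monotonic increase within the session."""
    tau, _ = kendalltau(np.arange(len(scores)), scores)
    return float(tau)

trial_scores = [12.1, 14.8, 15.3, 17.0]   # hypothetical one-session scores
mag = session_magnitude(trial_scores)
tau = session_consistency(trial_scores)   # strictly increasing -> tau = 1
```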
Group-descriptive stats and plots per subject follow below:
## [1] "GEOMETRIC MEAN ADJ SCORE:"
## vars n mean sd median trimmed mad min max range skew kurtosis
## X1 1 39 17.87 2.2 17.75 17.92 2.48 13.59 22.45 8.86 -0.12 -0.87
## se
## X1 0.35
## [1] "Data: sn.not$adj_score"
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.89 12.82 16.14 18.31 21.34 55.64
## [1] "... subset: TB"
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 6.10 19.25 23.01 25.30 30.04 55.64
## [1] "... subset: SMR"
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.89 11.69 14.27 14.60 16.79 35.83
## [1] "KENDALL CORRELATIONS:"
## vars n mean sd median trimmed mad min max range skew kurtosis
## X1 1 39 0.11 0.14 0.12 0.11 0.11 -0.2 0.43 0.63 0.33 -0.07
## se
## X1 0.02
We will make a series of linear mixed-effects models:

* Unconditional: no factor for sessions; this simply models the intercept per subject
* Fixed linear, random intercept: we allow the intercept to be random but fix a single group-wide slope
* Random linear, random intercept: we allow the slope to be random per subject
The idea is that we can measure the improvement in model fit as we make the models more realistic, and visualise the fit in terms of residuals. Thus, below we see summaries for each fitted model, and two plots. First, linear models are plotted for each subject (black), overlaid by the ‘prototype’ function for the model, i.e. Intercept + βx. Next, the residuals are plotted, showing how much variance is not accounted for by the model.
## Linear mixed-effects model fit by REML
## Data: scorelong
## AIC BIC logLik
## 5378.62 5392.834 -2686.31
##
## Random effects:
## Formula: ~1 | patient
## (Intercept) Residual
## StdDev: 5.792983 5.536973
##
## Fixed effects: adj_score ~ 1
## Value Std.Error DF t-value p-value
## (Intercept) 17.95138 1.22288 822 14.67959 0
##
## Standardized Within-Group Residuals:
## Min Q1 Med Q3 Max
## -3.13120599 -0.55418306 -0.03893965 0.48498069 5.06274016
##
## Number of Observations: 845
## Number of Groups: 23
## Linear mixed-effects model fit by REML
## Data: scorelong
## AIC BIC logLik
## 5306.978 5325.925 -2649.489
##
## Random effects:
## Formula: ~1 | patient
## (Intercept) Residual
## StdDev: 5.830495 5.276632
##
## Fixed effects: adj_score ~ 1 + session
## Value Std.Error DF t-value p-value
## (Intercept) 15.002825 1.2707104 821 11.806644 0
## session 0.156259 0.0170609 821 9.158915 0
## Correlation:
## (Intr)
## session -0.253
##
## Standardized Within-Group Residuals:
## Min Q1 Med Q3 Max
## -3.04661902 -0.60865209 -0.05272864 0.49917681 4.95880851
##
## Number of Observations: 845
## Number of Groups: 23
## Linear mixed-effects model fit by REML
## Data: scorelong
## AIC BIC logLik
## 5257.384 5285.806 -2622.692
##
## Random effects:
## Formula: ~1 + session | patient
## Structure: General positive-definite, Log-Cholesky parametrization
## StdDev Corr
## (Intercept) 4.2472191 (Intr)
## session 0.1514109 0.341
## Residual 5.0373681
##
## Fixed effects: adj_score ~ 1 + session
## Value Std.Error DF t-value p-value
## (Intercept) 14.933067 0.9536819 821 15.658331 0
## session 0.161691 0.0355815 821 4.544263 0
## Correlation:
## (Intr)
## session 0.132
##
## Standardized Within-Group Residuals:
## Min Q1 Med Q3 Max
## -3.56870595 -0.56502094 -0.05054914 0.48115477 5.21577503
##
## Number of Observations: 845
## Number of Groups: 23
We can then test the various models by ANOVA, to check the significance of model fit differences.
## Model df AIC BIC logLik Test L.Ratio p-value
## um.fit 1 3 5378.620 5392.834 -2686.310
## fl.ri.fit 2 4 5306.978 5325.925 -2649.489 1 vs 2 73.64206 <.0001
## Model df AIC BIC logLik Test L.Ratio p-value
## fl.ri.fit 1 4 5306.978 5325.925 -2649.489
## rl.ri.fit 2 6 5257.384 5285.806 -2622.692 1 vs 2 53.59343 <.0001
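The L.Ratio values in the tables above can be reproduced directly from the reported log-likelihoods; a small Python sketch of the likelihood-ratio test (the analysis itself is in R):

```python
from scipy.stats import chi2

def lr_test(loglik0, loglik1, df_diff):
    """Likelihood-ratio test between nested models:
    LR = 2 * (logLik_full - logLik_reduced), chi-square on df_diff."""
    lratio = 2.0 * (loglik1 - loglik0)
    return lratio, float(chi2.sf(lratio, df_diff))

# Log-likelihoods taken from the model summaries above
lr1, p1 = lr_test(-2686.310, -2649.489, df_diff=1)  # unconditional vs fixed slope
lr2, p2 = lr_test(-2649.489, -2622.692, df_diff=2)  # fixed vs random slope
# lr1 ~ 73.64 and lr2 ~ 53.59, matching the anova tables
```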
First, we will visualise this concept with a quadratic function fitted to the data for each subject: y = β₂·session² + β₁·session + ε
The plot is sorted from top left by protocol and gender. The second-order coefficient of this quadratic function, β₂, expresses the concept of plateau in the data through its sign. The sign is negative if the curve bends down at the ends (n-shaped), and positive if it bends up at the ends (u-shaped). An n-shaped curve has a plateau; a u-shaped one does not.
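The sign check can be sketched as follows (illustrative Python with synthetic session-wise scores; `plateau_sign` is a hypothetical helper):

```python
import numpy as np

def plateau_sign(sessions, scores):
    """Fit y = b2*s^2 + b1*s + c; a negative b2 (curve bending down,
    n-shaped) suggests a plateau, a positive b2 (u-shaped) none."""
    b2, _, _ = np.polyfit(sessions, scores, deg=2)
    return float(np.sign(b2))

s = np.arange(1, 39, dtype=float)
plateauing = 20.0 * np.log(s)        # decelerating improvement
accelerating = 0.02 * s**2 + 5.0     # late acceleration, no plateau
```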
We can compare the sign of each subject’s second-order coefficient with their session coefficient from the random-slopes, random-intercepts growth model above, which captures the degree of learning. This is done by Pearson correlation, reported below.
##
## Pearson's product-moment correlation
##
## data: LINEAR_LEARNING_COEF and QUADRATIC_SIGN
## t = -2.2365, df = 21, p-value = 0.0363
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.72053273 -0.03221834
## sample estimates:
## cor
## -0.4385958
Next, we extend the growth modelling approach by adding a fixed effect for the square of the session. We visualise this model in the same way as before.
## Linear mixed-effects model fit by REML
## Data: scorelong
## AIC BIC logLik
## 5258.891 5292.042 -2622.446
##
## Random effects:
## Formula: ~1 + session | patient
## Structure: General positive-definite, Log-Cholesky parametrization
## StdDev Corr
## (Intercept) 4.2710967 (Intr)
## session 0.1489576 0.341
## Residual 5.0060135
##
## Fixed effects: adj_score ~ 1 + session + I(session^2)
## Value Std.Error DF t-value p-value
## (Intercept) 13.531318 1.0429672 820 12.973867 0e+00
## session 0.377939 0.0727940 820 5.191905 0e+00
## I(session^2) -0.005719 0.0016867 820 -3.390758 7e-04
## Correlation:
## (Intr) sessin
## session -0.289
## I(session^2) 0.396 -0.876
##
## Standardized Within-Group Residuals:
## Min Q1 Med Q3 Max
## -3.43071866 -0.57358013 -0.06376315 0.47217879 5.27498571
##
## Number of Observations: 845
## Number of Groups: 23
We can also test the quadratic fit growth model against the earlier linear model using ANOVA.
## Model df AIC BIC logLik Test L.Ratio p-value
## rl.ri.fit 1 6 5257.384 5285.806 -2622.692
## rq.ri.fit 2 7 5258.891 5292.042 -2622.446 1 vs 2 0.4929621 0.4826
We can view the outcome of fitting a power law curve to our data by examining a linear fit in the log-log transformation space: i.e. the log-transform of both data dimensions (score & session). We can further fit a linear growth model to log-log data to find the quality of fit of a power law. For the exponential curve, we simply repeat the process in log-linear space, i.e. log transform score but leave session as is.
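The transform-space idea can be illustrated with synthetic data: a pure power-law learner is exactly linear in log-log space, so a linear fit there leaves (near-)zero residual, while the log-linear (exponential) fit of the same data does not. An illustrative Python sketch:

```python
import numpy as np

def rss_linear_fit(x, y):
    """Residual sum of squares of an OLS line fitted to (x, y)."""
    slope, intercept = np.polyfit(x, y, deg=1)
    resid = y - (slope * x + intercept)
    return float(resid @ resid)

s = np.arange(1, 39, dtype=float)
score = 10.0 * s ** 0.12                  # synthetic power-law learner

rss_powerlaw = rss_linear_fit(np.log(s), np.log(score))  # log-log space
rss_expon = rss_linear_fit(s, np.log(score))             # log-linear space
# rss_powerlaw is ~0; the exponential form leaves real residual
```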
## Linear mixed-effects model fit by REML
## Data: scorelong
## AIC BIC logLik
## 485.7556 514.1774 -236.8778
##
## Random effects:
## Formula: ~1 + log(session) | patient
## Structure: General positive-definite, Log-Cholesky parametrization
## StdDev Corr
## (Intercept) 0.36505767 (Intr)
## log(session) 0.08067801 -0.548
## Residual 0.29884820
##
## Fixed effects: log(adj_score) ~ 1 + log(session)
## Value Std.Error DF t-value p-value
## (Intercept) 2.4680460 0.08328934 821 29.632194 0
## log(session) 0.1224252 0.02068483 821 5.918595 0
## Correlation:
## (Intr)
## log(session) -0.632
##
## Standardized Within-Group Residuals:
## Min Q1 Med Q3 Max
## -7.27376238 -0.48760915 0.05495429 0.60067069 3.63167906
##
## Number of Observations: 845
## Number of Groups: 23
## Linear mixed-effects model fit by REML
## Data: scorelong
## AIC BIC logLik
## 490.263 518.6848 -239.1315
##
## Random effects:
## Formula: ~1 + session | patient
## Structure: General positive-definite, Log-Cholesky parametrization
## StdDev Corr
## (Intercept) 0.325712158 (Intr)
## session 0.007295471 -0.346
## Residual 0.298101414
##
## Fixed effects: log(adj_score) ~ 1 + session
## Value Std.Error DF t-value p-value
## (Intercept) 2.614301 0.07107060 821 36.78456 0
## session 0.009673 0.00180475 821 5.35972 0
## Correlation:
## (Intr)
## session -0.417
##
## Standardized Within-Group Residuals:
## Min Q1 Med Q3 Max
## -7.15317963 -0.49192923 0.07123171 0.58319751 3.60329584
##
## Number of Observations: 845
## Number of Groups: 23
We then compare the fit quality of these models. ANOVA does not make sense here because the fixed effects change, but we can observe the differences in the model-fit indices AIC and BIC. Note, though, that the log-transformed models have a different response scale from the raw-score model, so their AIC/BIC values are not directly comparable to the linear model’s without correcting for the transformation. In the same way, we can also compare the best-fitting curve to the best linear model, above.
## Random slope+intercept linear model:
## sigma logLik AIC BIC deviance
## 1 5.037368 -2622.692 5257.384 5285.806 NA
## Random slope+intercept log-log model:
## sigma logLik AIC BIC deviance
## 1 0.2988482 -236.8778 485.7556 514.1774 NA
## Random slope+intercept log-linear model:
## sigma logLik AIC BIC deviance
## 1 0.2981014 -239.1315 490.263 518.6848 NA
Given LCs based on the two types of session-performance index, magnitude and consistency, we want to establish if they display a pattern that matches skill acquisition theory. The hypothetical skill acquisition learning curve follows the Fitts-Posner three stage model (REF - see Edua’s theory text 07.06.2017).
We fit our data to this model by taking the cosine similarity of each subject’s LC with a canonical LC that represents Fitts’ model: this is our ideal LC. We can test if the resulting distribution mean differs from 0 for the group, in order to determine if this model captures the learning that we know has occurred.
We also want a comparison model, to determine whether our Fitts model fits the data better than some simpler explanation. Cosine similarity ranges from -1 to 1, as does correlation. Thus we can use Kendall correlation of score magnitude across sessions to derive a monotonic LC. We can do similarly for consistency, by aggregating the per-session correlations to derive a consistency LC (using the Hunter-Schmidt method; see Zhang & Wang (2014), Multivariate Behavioral Research, 49:2, 130-148).
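In its bare-bones form (without artefact corrections), the Hunter-Schmidt aggregate reduces to a sample-size-weighted mean of the per-session correlations; a minimal Python sketch with illustrative values:

```python
def hunter_schmidt_mean(correls, ns):
    """Bare-bones Hunter-Schmidt aggregate: sample-size-weighted mean
    of correlations (no artefact corrections applied)."""
    return sum(n * r for r, n in zip(correls, ns)) / sum(ns)

# Hypothetical per-session Kendall taus and trial counts for one subject
taus = [0.3, -0.1, 0.5, 0.2]
n_trials = [5, 4, 6, 5]
agg = hunter_schmidt_mean(taus, n_trials)  # weighted toward larger sessions
```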
Fitts’ model has three phases. Our initial choice of phase values for Fitts’ model is (0, 1, 0.5). The choice of model impacts the testing outcomes very strongly and will have to be explored in more detail later.
## [1] "Fitt's model for monotone improvement = 0, 1, 0.5"
## [1] "Fitt's model for geometric mean score = -0.5, 0, 0.5"
“MonoLC.not” = monotonic LC (correlation across session means) for NOT-INVERSE trials data:
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -0.18413 0.03821 0.25042 0.19629 0.38310 0.47368
“ConsLC.not” = consistency LC (aggregate of session-wise correlations) for NOT-INVERSE trials data:
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -0.11473 -0.01019 0.10126 0.09519 0.18037 0.31624
“IscrLC.not” = per-session geometric mean score cosine similarity LC for NOT-INVERSE trials data:
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -0.100746 -0.004049 0.061513 0.054777 0.129322 0.182141
“IcorLC.not” = per-session correlations cosine similarity LC for NOT-INVERSE trials data:
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -0.29381 0.05013 0.17940 0.16393 0.36723 0.48241
It is useful to look at the LC data in raw form. The consistency, monotonic and ideal LCs are real-valued, share the same range [-1, 1], and have a similar interpretation (distance from perfect performance according to the model being used). Thus, we examine the four LCs side-by-side in the same plot.
We can also sort the main LCs and view scatterplots to get a sense of how clustered the results are.
We can explore how the LCs relate to each other using a correlation matrix plot. Variables with their ranges lie on the diagonal. Over the diagonal are correlations of variable-pairs, and confidence intervals (in parentheses). Under the diagonal are loess-curve fits to the scatter plots of variable-pairs.
First, we want to know whether each LC distribution was significantly biased, i.e. does the LC represent a consistent pattern of change across the group (do they improve)?
The monotonic and consistency LCs are based on calculations of Kendall’s tau. Under the null hypothesis of independence of X and Y, the sampling distribution of tau has an expected value of zero. Thus the group-wise null hypothesis is that the distribution of LC values has a location of zero. Because we have a small sample, we will not assume that the LCs are normally distributed, but instead use Wilcoxon’s non-parametric signed-rank test.
Similarly using Wilcoxon’s non-parametric test, we also test whether the group-wise Ideal LC distribution was significantly biased, i.e. do they learn a skill?
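For illustration, the test amounts to the following (a Python sketch with hypothetical per-subject LC values standing in for e.g. MonoLC.not; the report’s analysis uses R):

```python
import numpy as np
from scipy.stats import wilcoxon

# Hypothetical per-subject LC values; H0: symmetric about zero
lc_values = np.array([0.21, 0.35, -0.05, 0.18, 0.40, 0.12, 0.27,
                      0.09, 0.31, -0.13, 0.22, 0.16, 0.38, 0.25])
stat, p = wilcoxon(lc_values)  # signed-rank test against location 0
```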
##
## Wilcoxon signed rank test
##
## data: MonoLC.not[, 2]
## V = 247, p-value = 0.0004079
## alternative hypothesis: true location is not equal to 0
## 95 percent confidence interval:
## 0.1047619 0.3114846
## sample estimates:
## (pseudo)median
## 0.2075842
##
## Wilcoxon signed rank test
##
## data: ConsLC.not[, 2]
## V = 234, p-value = 0.002416
## alternative hypothesis: true location is not equal to 0
## 95 percent confidence interval:
## 0.0373236 0.1514861
## sample estimates:
## (pseudo)median
## 0.09796444
##
## Wilcoxon signed rank test
##
## data: IscrLC.not[, 2]
## V = 225, p-value = 0.006711
## alternative hypothesis: true location is not equal to 0
## 95 percent confidence interval:
## 0.01751030 0.09174911
## sample estimates:
## (pseudo)median
## 0.05616349
##
## Wilcoxon signed rank test
##
## data: IcorLC.not[, 2]
## V = 227, p-value = 0.005414
## alternative hypothesis: true location is not equal to 0
## 95 percent confidence interval:
## 0.06977147 0.27633928
## sample estimates:
## (pseudo)median
## 0.168556
Finally, the idea of including the monotonic and consistency LCs is that they provide a simple base case against which to compare the Ideal LCs. We do this by comparing how well each fits the data. Since all LCs share the same distribution ranges, they can all be subjected to the same logic: a perfect relation between model and data gives a score of ±1, and an orthogonal relation gives 0. Thus the model fit can be calculated simply as Σᵢ₌₁ⁿ (1 − |xᵢ|) / n, i.e. the average distance of the absolute values from 1. As usual with fit indices, lower is better!
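A minimal sketch of this fit index:

```python
import numpy as np

def lc_model_fit(lc_values):
    """Average distance of |similarity| values from 1:
    0 = perfect fit, 1 = orthogonal. Lower is better."""
    x = np.abs(np.asarray(lc_values, dtype=float))
    return float(np.mean(1.0 - x))

fit_perfect = lc_model_fit([1.0, -1.0, 1.0])     # perfect (anti)relation
fit_orthogonal = lc_model_fit([0.0, 0.0, 0.0])   # no relation at all
```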
## MonoLC model fit: 0.7472654
## ConsLC model fit: 0.8714739
## IdealLC.score model fit: 0.9196877
## IdealLC.correl model fit: 0.7543331
For RQ1, we see that the random slopes and intercepts model has the best fit, significant by ANOVA at p<.0001. The group-level coefficient of session is ~0.16, giving a total rise of ~6.08 over the measured 38 sessions, significant at p<0.00001.
For RQ2, we see that most subjects (n=15) have a plateau, i.e. their second-order coefficient is negative (n-shaped curve). The minority with no plateau (u-shaped curves, n=8) is big enough to be meaningful, though. The Pearson correlation between the quadratic sign and the learning coefficient is r=-0.44, significant at p<0.05. Since a negative sign indicates a plateaued curve, the negative correlation indicates that subjects who learned more tended to plateau.
Interestingly, the quadratic growth model doesn’t capture the u-shape of some subjects’ data (all individual curves in the plot are n-shaped); it also doesn’t fit the data any better than the linear model. Thus it is probably the wrong way to approach this question.
The subject-wise plots show that both a power law AND an exponential curve fit the data very well (the data are almost linear in the transform space) for quite a few subjects, but the fit is poor for a substantial minority. The comparison of fit indices shows that the two curve families are roughly equal: power law (log-log) is slightly better than exponential (log-linear). Both have far lower AIC/BIC than the linear model, although this comparison should be treated cautiously given the change of response scale; still, even if these curves do not fit perfectly for everyone (motivating the non-parametric approach), the majority pattern is that subjects learn in a classic ‘power-law’ way.
There are quite a few results here:

* The boxplot and Wilcoxon tests show that all LCs capture some kind of positive relationship: all are significantly different from zero.
* The correlation matrix indicates that LCs based on the same data (session-wise correlations or mean scores) are quite highly correlated (>0.7).
* The scatter plots show that each LC is distributed quite uniformly across the range: no strong clustering.
* The fit values are all quite poor, especially for ConsLC (0.87) and IdealLC.score (0.92).
* Finally, the comparison of LCs by fit indicates: (a) IdealLC.score is a poorly chosen model compared with MonoLC; and (b) IdealLC.correl improves over the base model ConsLC.
RQ1-3 seem like good and complete analyses, from which insights and reporting can be drawn. RQ4 still lacks a major insight: it would be great to try to improve the model-fit scores by optimisation; however, this might not be possible in the near future.
TODO: describe data for Part B RQs
For each dataset we create a new index of sessions, so that rows can be aligned according to how many sessions of a protocol had been conducted, rather than by absolute session number. Thus, e.g., transfer trials are indexed from 1 to 10 for all subjects, regardless of the session in which their transfer trials actually started (the first transfer trial ranged from session 28 to session 35, depending on subject).
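This re-indexing can be sketched as follows (illustrative Python; the `patient`/`session` field names are hypothetical):

```python
from collections import defaultdict

def reindex_sessions(rows):
    """Replace absolute session numbers with a per-subject 1..k index,
    so subjects align on 'k-th session of this mode' regardless of
    when the mode actually started."""
    seen = defaultdict(dict)  # subject -> {session: new_index}
    out = []
    for r in sorted(rows, key=lambda r: (r["patient"], r["session"])):
        idx_map = seen[r["patient"]]
        if r["session"] not in idx_map:
            idx_map[r["session"]] = len(idx_map) + 1
        out.append({**r, "session_idx": idx_map[r["session"]]})
    return out

rows = [{"patient": "p1", "session": 28}, {"patient": "p1", "session": 29},
        {"patient": "p2", "session": 31}]
indexed = reindex_sessions(rows)  # p1: 1, 2; p2: 1
```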
…
However, pruning out sessions with 1 trial results in losing quite a lot of data.
…
The total number of sessions per subject for each training mode now varies quite a bit. This is no problem, because our LC calculation methods are not sensitive to small differences in N, except when N is very small (see more below). However, for the transfer training mode, removing sessions with one trial results in losing entire subjects. For this reason, and because parametric or session-wise LCs measure a different thing from trial-wise correlation-based LCs, we can use different datasets for each: for session LCs, we will include sessions with one trial.
…
We can further subdivide data by subjects in TB protocol and subjects in SMR protocol…